
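How to Monitor the Linux OOM Killer with eBPF: Linux Memory Tuning and OOM Killer Event Monitoring

"Knowing that such things cannot be had all at once, I entrust these lingering notes to the mournful wind." (《赤壁赋》, Ode on the Red Cliff)

Preface

This post covers monitoring OOM killer events with eBPF and collecting some related system metrics at the moment they occur.
It introduces the traditional monitoring approaches as well as the BPF/eBPF approach.
What the OOM killer is, and the kernel parameters that tune it, are out of scope here.
If I have misunderstood anything, corrections are welcome :) Keep going!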


The Linux environment used for the experiments below:

[root@developer ~]# hostnamectl 
 Static hostname: developer
       Icon name: computer-vm
         Chassis: vm
      Machine ID: 7ad73f2b5f7046a2a389ca780f472467
         Boot ID: cef15819a5c34efa92443b6eff608cc9
  Virtualization: kvm
Operating System: openEuler 22.03 (LTS-SP4)
          Kernel: Linux 5.10.0-250.0.0.154.oe2203sp4.aarch64
    Architecture: arm64
 Hardware Vendor: OpenStack Foundation
  Hardware Model: OpenStack Nova
[root@developer ~]#

In what follows, "BPF" and "eBPF" both refer to the BPF/eBPF technology as a whole.

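OOM Killer events: the OOM Killer (Out-Of-Memory Killer) is the kernel's emergency mechanism for when system memory is critically short. It terminates processes to free memory and keep the system stable. Every process has an OOM score, and the kernel picks its victim based on that score. Several kernel parameters control the OOM Killer's behavior, and in production you may want to configure them with QoS in mind, although a sounder approach is to use cgroups to limit the memory available to different processes. I will not go into those topics here, including how the OOM score is computed; interested readers can look them up.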
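For readers who want a quick look at those scores without leaving the terminal, here is a minimal Python sketch. It assumes the standard procfs files /proc/<pid>/oom_score, /proc/<pid>/oom_score_adj and /proc/<pid>/comm; the function name top_oom_candidates and its proc_root parameter are made up for this illustration and are not part of any tool used in this post.

```python
import os


def top_oom_candidates(proc_root="/proc", limit=5):
    """Return up to `limit` (pid, oom_score, oom_score_adj, comm) tuples,
    sorted from the highest oom_score (most likely victim) downwards."""
    rows = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # only numeric directories are processes
        base = os.path.join(proc_root, name)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read())
            with open(os.path.join(base, "oom_score_adj")) as f:
                adj = int(f.read())
            with open(os.path.join(base, "comm")) as f:
                comm = f.read().strip()
        except (OSError, ValueError):
            continue  # the process exited or the file was unreadable
        rows.append((int(name), score, adj, comm))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]
```

Calling top_oom_candidates() on a live Linux system lists the processes the kernel would currently prefer to kill; the proc_root parameter exists only so the function can be pointed at a test directory.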

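Traditional OOM Killer event monitoring

Traditionally, past OOM killer activity is reviewed through the kernel log or through the event counters of the cgroup memory subsystem.

The cgroup memory subsystem keeps OOM-related statistics in memory.events, a counter file for memory events: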

┌──[root@liruilongs.github.io]-[/usr/lib/systemd/system] 
└─$cat /sys/fs/cgroup/memory/system.slice/tuned.service/memory.events
low 0
high 0
limit_in_bytes 0
oom 0
┌──[root@liruilongs.github.io]-[/usr/lib/systemd/system] 
└─$

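What each counter means (the exact set of fields varies with the kernel and cgroup version):

low: number of low-memory-pressure events
high: number of high-memory-pressure events
limit_in_bytes: number of times the memory limit was reached
oom: number of times OOM (out of memory) was triggered. All zeros means no such event has occurred.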
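Because these are plain counters, the useful signal is usually whether they have grown since the last check. A minimal sketch of that idea in Python, assuming the simple "name value" layout shown above; parse_memory_events and grown_counters are hypothetical helper names, not an existing API.

```python
def parse_memory_events(text):
    """Parse 'name value' lines from a memory.events file into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def grown_counters(before, after):
    """Return {name: delta} for counters that increased between snapshots."""
    return {name: value - before.get(name, 0)
            for name, value in after.items()
            if value > before.get(name, 0)}


# Sample taken from the output shown above.
SAMPLE = "low 0\nhigh 0\nlimit_in_bytes 0\noom 0\n"
baseline = parse_memory_events(SAMPLE)
```

In practice you would read the file for the cgroup you care about at a fixed interval and alert when grown_counters() reports a non-empty result for oom.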

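The kernel log (dmesg) shows detailed data about the process the OOM killer acted on.

In the log below, the system ran out of memory, the OOM Killer fired, and the stress-ng process (PID 39693) was terminated: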

[root@liruilongs.github.io ~]# dmesg -T | grep -A 30  -i "Killed process 39693"
[日 5月 11 15:41:12 2025] Out of memory: Killed process 39693 (stress-ng) total-vm:2410396kB, anon-rss:1896300kB, file-rss:4kB, shmem-rss:60kB, UID:0 pgtables:3772kB oom_score_adj:1000
[日 5月 11 15:41:13 2025] stress-ng invoked oom-killer: gfp_mask=0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=1000
[日 5月 11 15:41:13 2025] CPU: 0 PID: 39692 Comm: stress-ng Kdump: loaded Not tainted 5.10.0-250.0.0.154.oe2203sp4.aarch64 #1
[日 5月 11 15:41:13 2025] Hardware name: OpenStack Foundation OpenStack Nova, BIOS 0.0.0 02/06/2015
[日 5月 11 15:41:13 2025] Call trace:
[日 5月 11 15:41:13 2025]  dump_backtrace+0x0/0x214
[日 5月 11 15:41:13 2025]  show_stack+0x20/0x2c
[日 5月 11 15:41:13 2025]  dump_stack+0xf0/0x138
[日 5月 11 15:41:13 2025]  dump_header+0x50/0x1b0
[日 5月 11 15:41:13 2025]  oom_kill_process+0x258/0x270
[日 5月 11 15:41:13 2025]  out_of_memory+0xf4/0x3b0
[日 5月 11 15:41:13 2025]  __alloc_pages+0x1024/0x1214
[日 5月 11 15:41:13 2025]  alloc_pages_vma+0xb4/0x3e0
[日 5月 11 15:41:13 2025]  do_anonymous_page+0x1d4/0x784
[日 5月 11 15:41:13 2025]  handle_pte_fault+0x19c/0x240
[日 5月 11 15:41:13 2025]  __handle_mm_fault+0x1bc/0x3ac
[日 5月 11 15:41:13 2025]  handle_mm_fault+0xf4/0x260
[日 5月 11 15:41:13 2025]  do_page_fault+0x184/0x464
[日 5月 11 15:41:13 2025]  do_translation_fault+0xb8/0xe4
[日 5月 11 15:41:13 2025]  do_mem_abort+0x48/0xc0
[日 5月 11 15:41:13 2025]  el0_da+0x44/0x80
[日 5月 11 15:41:13 2025]  el0_sync_handler+0x68/0xc0
[日 5月 11 15:41:13 2025]  el0_sync+0x160/0x180
[日 5月 11 15:41:13 2025] Mem-Info:
[日 5月 11 15:41:13 2025] active_anon:4575 inactive_anon:1652614 isolated_anon:0
                             active_file:21 inactive_file:34 isolated_file:0
                             unevictable:1551 dirty:0 writeback:0
                             slab_reclaimable:5171 slab_unreclaimable:10557
                             mapped:2877 shmem:10220 pagetables:6130 bounce:0
                             free:14963 free_pcp:25 free_cma:0
[日 5月 11 15:41:13 2025] Node 0 active_anon:18300kB inactive_anon:6610456kB active_file:76kB inactive_file:132kB unevictable:6204kB isolated(anon):0kB isolated(file):0kB mapped:11508kB dirty:0kB writeback:0kB shmem:40880kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 3829760kB writeback_tmp:0kB kernel_stack:6384kB all_unreclaimable? yes
[root@liruilongs.github.io ~]#

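The dmesg output shows that the system ran out of memory and the OOM Killer terminated the stress-ng process (PID 39693). The kernel log carries detailed data about the event.

The first related line records the kill itself, together with the victim's memory footprint: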

[日 5月 11 15:41:12 2025] Out of memory: Killed process 39693 (stress-ng) 
total-vm:2410396kB, anon-rss:1896300kB, file-rss:4kB, shmem-rss:60kB, 
UID:0 pgtables:3772kB oom_score_adj:1000

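Meaning of each field in this line:

total-vm:2410396kB: total virtual address space of the process, about 2.3 GiB. This is virtual size, not memory that is actually resident.
anon-rss:1896300kB: resident anonymous memory (heap/stack), about 1.8 GiB. "rss" means it is resident in physical memory.
file-rss:4kB: the process currently holds 4 KB of physical memory through file mappings (files mapped with mmap()).
shmem-rss:60kB: the process currently holds 60 KB of physical memory through shared memory (typically created with shmget() or tmpfs, for example /dev/shm).
UID:0: 0 is the root user.
pgtables:3772kB: the page tables of the process take about 3.7 MB of memory.
oom_score_adj:1000: the OOM score adjustment of the process is at its maximum (1000), which makes it the kernel's preferred victim.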
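When many such lines need to be reviewed, it helps to pull the fields out programmatically. The following is a minimal Python sketch that assumes the line format shown above (it differs between kernel versions, so treat the regular expression as an assumption); parse_oom_kill_line is a hypothetical helper name.

```python
import re

# Matches the "Killed process" line format shown above (5.10-era kernel).
KILL_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]*)\)[, ]+"
    r"total-vm:(?P<total_vm_kb>\d+)kB, anon-rss:(?P<anon_rss_kb>\d+)kB, "
    r"file-rss:(?P<file_rss_kb>\d+)kB, shmem-rss:(?P<shmem_rss_kb>\d+)kB"
    r"(?:, UID:(?P<uid>\d+))?"
    r"(?: pgtables:(?P<pgtables_kb>\d+)kB)?"
    r"(?: oom_score_adj:(?P<oom_score_adj>-?\d+))?"
)


def parse_oom_kill_line(line):
    """Return the fields of an OOM kill log line as a dict, or None."""
    match = KILL_RE.search(line)
    if match is None:
        return None
    record = {"comm": match.group("comm")}
    for key, value in match.groupdict().items():
        if key != "comm" and value is not None:
            record[key] = int(value)
    return record


SAMPLE = ("Out of memory: Killed process 39693 (stress-ng) "
          "total-vm:2410396kB, anon-rss:1896300kB, file-rss:4kB, "
          "shmem-rss:60kB, UID:0 pgtables:3772kB oom_score_adj:1000")
record = parse_oom_kill_line(SAMPLE)
# Resident memory of the victim is the sum of the three rss fields.
resident_kb = (record["anon_rss_kb"] + record["file_rss_kb"]
               + record["shmem_rss_kb"])
```

Summing the three rss fields gives the physical memory the kill was expected to release, which is usually the number worth alerting on.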

Next comes the corresponding call stack:

[日 5月 11 15:41:13 2025] Call trace:
[日 5月 11 15:41:13 2025]  dump_backtrace+0x0/0x214
[日 5月 11 15:41:13 2025]  show_stack+0x20/0x2c
[日 5月 11 15:41:13 2025]  dump_stack+0xf0/0x138
[日 5月 11 15:41:13 2025]  dump_header+0x50/0x1b0
[日 5月 11 15:41:13 2025]  oom_kill_process+0x258/0x270
[日 5月 11 15:41:13 2025]  out_of_memory+0xf4/0x3b0
[日 5月 11 15:41:13 2025]  __alloc_pages+0x1024/0x1214   # physical page allocation attempt fails
[日 5月 11 15:41:13 2025]  alloc_pages_vma+0xb4/0x3e0
[日 5月 11 15:41:13 2025]  do_anonymous_page+0x1d4/0x784  # anonymous page allocation fails
[日 5月 11 15:41:13 2025]  handle_pte_fault+0x19c/0x240   # page table entry fault handling
[日 5月 11 15:41:13 2025]  __handle_mm_fault+0x1bc/0x3ac
[日 5月 11 15:41:13 2025]  handle_mm_fault+0xf4/0x260
[日 5月 11 15:41:13 2025]  do_page_fault+0x184/0x464
[日 5月 11 15:41:13 2025]  do_translation_fault+0xb8/0xe4
[日 5月 11 15:41:13 2025]  do_mem_abort+0x48/0xc0
[日 5月 11 15:41:13 2025]  el0_da+0x44/0x80
[日 5月 11 15:41:13 2025]  el0_sync_handler+0x68/0xc0
[日 5月 11 15:41:13 2025]  el0_sync+0x160/0x180
...

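Reading the stack from the bottom up: the lower frames are the page fault whose memory allocation failed, which leads into out_of_memory() and then oom_kill_process(), the call that invokes the OOM killer. The frames above that (dump_header, dump_stack and so on) only dump diagnostic information to the log.

Below is the system-wide memory state at the time of the OOM kill: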

[日 5月 11 15:41:13 2025] Mem-Info:
[日 5月 11 15:41:13 2025] active_anon:4575 inactive_anon:1652614 isolated_anon:0
                             active_file:21 inactive_file:34 isolated_file:0
                             unevictable:1551 dirty:0 writeback:0
                             slab_reclaimable:5171 slab_unreclaimable:10557
                             mapped:2877 shmem:10220 pagetables:6130 bounce:0
                             free:14963 free_pcp:25 free_cma:0

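The kernel log records the detailed memory counters at the time of the failure. Note that the values in the Mem-Info block are page counts, not kB: with the 4 KB pages used here, active_anon:4575 corresponds to the active_anon:18300kB reported on the Node 0 line.

active_anon:4575        # active anonymous memory (heap, stack and other dynamically allocated process memory)
inactive_anon:1652614   # inactive anonymous memory (heap/stack not used for a long time)
isolated_anon:0         # anonymous pages temporarily isolated from the LRU lists (for example during reclaim or migration)
active_file:21          # active file-backed memory (page cache, open files)
inactive_file:34        # inactive file-backed memory
isolated_file:0         # file-backed pages temporarily isolated from the LRU lists
unevictable:1551        # memory that cannot be swapped out or reclaimed (for example mlock-ed memory)
slab_reclaimable:5171   # reclaimable slab caches (kernel object caches)
slab_unreclaimable:10557 # unreclaimable slab caches (kernel data structures)
shmem:10220             # shared memory (tmpfs, IPC)
mapped:2877             # file pages mapped into processes (shared libraries, memory-mapped files)
pagetables:6130         # page tables mapping process virtual addresses to physical addresses
free:14963              # completely free physical memory
free_pcp:25             # free pages held in per-CPU lists (used for local allocation)
free_cma:0              # free pages in CMA (Contiguous Memory Allocator) areas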
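A note on units: the Node 0 line printed right after this block reports active_anon:18300kB, which is exactly 4575 × 4, so the Mem-Info counters look like page counts with 4 KB pages rather than kB values. A small Python sketch of that check; the 4 KB page size is inferred from this log and would differ on systems with larger base pages, and parse_counters and pages_to_kib are hypothetical helper names.

```python
import re

PAGE_SIZE_KIB = 4  # assumption: 4 KB base pages, inferred from this log


def parse_counters(text):
    """Extract 'name:number' pairs from a Mem-Info style line."""
    return {name: int(value)
            for name, value in re.findall(r"([a-z_]+):(\d+)", text)}


def pages_to_kib(pages, page_size_kib=PAGE_SIZE_KIB):
    """Convert a page count to kB."""
    return pages * page_size_kib


# First line of the Mem-Info block shown above.
mem_info = parse_counters(
    "active_anon:4575 inactive_anon:1652614 isolated_anon:0")
active_anon_kib = pages_to_kib(mem_info["active_anon"])
inactive_anon_kib = pages_to_kib(mem_info["inactive_anon"])
```

Both converted values agree with the Node 0 line (18300kB and 6610456kB), which supports reading the Mem-Info numbers as pages.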

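The kernel also prints the memory counters for NUMA node 0. Memory on the node is almost completely exhausted, mostly by anonymous memory, and all_unreclaimable? yes shows that the kernel found nothing left to reclaim. Transparent huge pages account for a large share (anon_thp: 3829760kB), which may further aggravate memory fragmentation.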

[日 5月 11 15:41:13 2025] Node 0 active_anon:18300kB inactive_anon:6610456kB active_file:76kB inactive_file:132kB unevictable:6204kB isolated(anon):0kB isolated(file):0kB mapped:11508kB dirty:0kB writeback:0kB shmem:40880kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 3829760kB writeback_tmp:0kB kernel_stack:6384kB all_unreclaimable? yes

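Using BPF

For BPF-based monitoring, the main tools are the BCC and bpftrace versions of oomkill. When an OOM killer event fires, they also show the system load averages and a few other details. They work by dynamically instrumenting the kernel function oom_kill_process() to capture each OOM Killer invocation.

The load averages give some context about the overall system state at the time of the OOM, showing whether the system was getting busier or was steady. The tools also report which process triggered the OOM Killer and which process the OOM Killer killed.

Let's reproduce an OOM kill with a memory stress tool and see how to monitor it. Swap has to be disabled first; otherwise the kernel keeps swapping pages out (kswapd) and the OOM Killer is hard to trigger.

[root@liruilongs.github.io ~]# swapoff -a # disable swap temporarily

Use stress-ng to put heavy memory pressure on the Linux system: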

[root@liruilongs.github.io ~]# stress-ng --vm 4 --vm-bytes 9.5G  --timeout 60s
stress-ng: info:  [37336] setting to a 60 second run per stressor
stress-ng: info:  [37336] dispatching hogs: 4 vm
^[c^Cstress-ng: info:  [37336] successful run completed in 40.87s
[root@liruilongs.github.io ~]#

Watch memory usage with the free command; the middle sample shows memory almost completely used up:

[root@liruilongs.github.io ~]# free -h -s 10  -c 3
               total        used        free      shared  buff/cache   available
Mem:           6.5Gi       815Mi       5.6Gi        37Mi       293Mi       5.7Gi
Swap:             0B          0B          0B

               total        used        free      shared  buff/cache   available
Mem:           6.5Gi       6.4Gi       183Mi        39Mi       113Mi       139Mi
Swap:             0B          0B          0B

               total        used        free      shared  buff/cache   available
Mem:           6.5Gi       4.5Gi       2.0Gi        39Mi        84Mi       2.0Gi
Swap:             0B          0B          0B
[root@liruilongs.github.io ~]#

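Watch the OOM Killer with the oomkill tool.

As the call stack of the failed allocation showed earlier, the kill goes through oom_kill_process(), and that kernel function is exactly where the BPF tool places its probe.

In the output, most of the triggering processes are stress-ng (the memory stress tool) itself, which keeps requesting memory until physical memory is exhausted. Allocations by some system processes (such as oeaware and Xvnc) also trigger the OOM killer, which shows that memory contention is intense and the whole system is under heavy pressure. The load averages are high (for example 4.59) and the 1-minute value keeps rising, so the system load is increasing.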

[root@liruilongs.github.io ~]# /usr/share/bcc/tools/oomkill 
Tracing OOM kills... Ctrl-C to stop.
15:41:14 Triggered by PID 1039 ("oeaware"), OOM kill of PID 39693 ("stress-ng"), 1704429 pages, loadavg: 4.34 2.87 1.77 6/396 39695
15:41:15 Triggered by PID 39692 ("stress-ng"), OOM kill of PID 39692 ("stress-ng"), 1704429 pages, loadavg: 4.34 2.87 1.77 5/396 39696
15:41:16 Triggered by PID 39696 ("stress-ng"), OOM kill of PID 39694 ("stress-ng"), 1704429 pages, loadavg: 4.31 2.89 1.78 5/396 39697
15:41:17 Triggered by PID 39698 ("stress-ng"), OOM kill of PID 39695 ("stress-ng"), 1704429 pages, loadavg: 4.31 2.89 1.78 5/396 39699
15:41:19 Triggered by PID 1039 ("oeaware"), OOM kill of PID 39696 ("stress-ng"), 1704429 pages, loadavg: 4.31 2.89 1.78 5/396 39700
15:41:20 Triggered by PID 2121 ("ibus-ui-gtk3"), OOM kill of PID 39697 ("stress-ng"), 1704429 pages, loadavg: 4.31 2.89 1.78 6/396 39701
15:41:22 Triggered by PID 39699 ("stress-ng"), OOM kill of PID 39698 ("stress-ng"), 1704429 pages, loadavg: 4.29 2.91 1.80 5/396 39701
15:41:23 Triggered by PID 39700 ("stress-ng"), OOM kill of PID 39700 ("stress-ng"), 1704429 pages, loadavg: 4.29 2.91 1.80 6/396 39702
15:41:24 Triggered by PID 39701 ("stress-ng"), OOM kill of PID 39699 ("stress-ng"), 1704429 pages, loadavg: 4.29 2.91 1.80 5/396 39704
15:41:25 Triggered by PID 39702 ("stress-ng"), OOM kill of PID 39701 ("stress-ng"), 1704429 pages, loadavg: 4.29 2.91 1.80 5/396 39704
15:41:26 Triggered by PID 39703 ("stress-ng"), OOM kill of PID 39702 ("stress-ng"), 1704429 pages, loadavg: 4.59 3.00 1.83 5/396 39705
15:41:27 Triggered by PID 39705 ("stress-ng"), OOM kill of PID 39703 ("stress-ng"), 1704429 pages, loadavg: 4.59 3.00 1.83 5/396 39706
15:41:29 Triggered by PID 1304 ("Xvnc"), OOM kill of PID 39704 ("stress-ng"), 1704429 pages, loadavg: 4.59 3.00 1.83 6/396 39708
15:41:30 Triggered by PID 1492 ("lightdm-gtk-gre"), OOM kill of PID 39705 ("stress-ng"), 1704429 pages, loadavg: 4.59 3.00 1.83 5/395 39708
15:41:31 Triggered by PID 39709 ("stress-ng"), OOM kill of PID 39706 ("stress-ng"), 1704429 pages, loadavg: 4.94 3.10 1.87 8/395 39710

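Let's look at the fields in the output, taking the first line as an example:

15:41:14 Triggered by PID 1039 ("oeaware"), OOM kill of PID 39693 ("stress-ng"), 1704429 pages, loadavg: 4.34 2.87 1.77 6/396 39695

Field	Meaning
`Triggered by PID`	PID of the process that triggered the OOM (for example, the one requesting memory)
`OOM kill of PID`	PID of the process terminated by the OOM Killer
`1704429 pages`	oc->totalpages: the total number of memory pages the kernel considers for this OOM, not the victim's own usage. At 4 KB per page this is 6817716 kB, about 6.5 GiB, which matches the 6.5Gi total reported by free
`loadavg`	System load averages (1-minute/5-minute/15-minute)
`6/396`	Currently runnable tasks / total tasks
`39695`	PID of the most recently created process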
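If the tool's output is saved to a file, the same fields can be extracted for later analysis. Here is a minimal Python sketch that assumes the exact line format printed above and 4 KB pages; parse_oomkill_line is a hypothetical helper name, not part of BCC.

```python
import re

# One line of oomkill output, in the format shown above.
OOMKILL_RE = re.compile(
    r'(?P<time>\d{2}:\d{2}:\d{2}) Triggered by PID (?P<fpid>\d+) '
    r'\("(?P<fcomm>[^"]*)"\), OOM kill of PID (?P<tpid>\d+) '
    r'\("(?P<tcomm>[^"]*)"\), (?P<pages>\d+) pages, loadavg: '
    r'(?P<load1>[\d.]+) (?P<load5>[\d.]+) (?P<load15>[\d.]+) '
    r'(?P<runnable>\d+)/(?P<tasks>\d+) (?P<last_pid>\d+)'
)


def parse_oomkill_line(line, page_size_kib=4):
    """Return the fields of one oomkill output line as a dict, or None."""
    match = OOMKILL_RE.search(line)
    if match is None:
        return None
    record = match.groupdict()
    for key in ("fpid", "tpid", "pages", "runnable", "tasks", "last_pid"):
        record[key] = int(record[key])
    for key in ("load1", "load5", "load15"):
        record[key] = float(record[key])
    # Convert the page count to kB (assumes 4 KB pages by default).
    record["total_kib"] = record["pages"] * page_size_kib
    return record


SAMPLE = ('15:41:14 Triggered by PID 1039 ("oeaware"), OOM kill of PID 39693 '
          '("stress-ng"), 1704429 pages, loadavg: 4.34 2.87 1.77 6/396 39695')
record = parse_oomkill_line(SAMPLE)
```

For the sample line, total_kib comes out as 6817716, the same value as MemTotal on this machine, which is a quick way to confirm that the pages field describes system memory rather than the victim.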

The stock output is fairly basic. If we want more data, we can modify the original script.

The corresponding bpftrace script:

https://github.com/brendangregg/bpf-perf-tools-book/blob/master/originals/Ch07_Memory/oomkill.bt

[root@liruilongs.github.io ~]# cat  /usr/share/bpftrace/tools/oomkill.bt 
#!/usr/bin/bpftrace
/*
 * oomkill Trace OOM killer.
 *  For Linux, uses bpftrace and eBPF.
 *
 * This traces the kernel out-of-memory killer, and prints basic details,
 * including the system load averages. This can provide more context on the
 * system state at the time of OOM: was it getting busier or steady, based
 * on the load averages? This tool may also be useful to customize for
 * investigations; for example, by adding other task_struct details at the
 * time of the OOM, or other commands in the system() call.
 *
 * This currently works by using kernel dynamic tracing of oom_kill_process().
 *
 * USAGE: oomkill.bt
 *
 * Copyright 2018 Netflix, Inc.
 * Licensed under the Apache License, Version 2.0 (the "License")
 *
 * 07-Sep-2018 Brendan Gregg Created this.
 */

#include <linux/oom.h>

BEGIN
{
 printf("Tracing oom_kill_process()... Hit Ctrl-C to end.\n");
}

kprobe:oom_kill_process
{
 $oc = (struct oom_control *)arg0;
 time("%H:%M:%S ");
 printf("Triggered by PID %d (\"%s\"), ", pid, comm);
 printf("OOM kill of PID %d (\"%s\"), %d pages, loadavg: ",
     $oc->chosen->pid, $oc->chosen->comm, $oc->totalpages);
 cat("/proc/loadavg");
}
[root@liruilongs.github.io ~]#

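The script dynamically instruments the kernel function oom_kill_process() to capture each OOM Killer invocation, and prints a few additional metrics along with it.

Custom metric collection when the OOM Killer fires

Building on the script above, we add some further metrics so that problems can be located quickly when the OOM Killer fires.

First, add the global memory counters from /proc/meminfo. meminfo provides a superset of the system-wide memory statistics, covering what vmstat, top, free and procinfo report.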

#include <linux/oom.h>

BEGIN
{
        printf("Tracing oom_kill_process()... Hit Ctrl-C to end.\n");
}

kprobe:oom_kill_process
{
        $oc = (struct oom_control *)arg0;
        $task = $oc->chosen;
        time("%H:%M:%S ");
        printf("Triggered by PID %d (\"%s\"), ", pid, comm);
        printf("OOM kill of PID %d (\"%s\"), %d pages, loadavg: ",
            $oc->chosen->pid, $oc->chosen->comm, $oc->totalpages);
        cat("/proc/loadavg");
        print("System memory statistics at OOM time:");
        cat("/proc/meminfo");
}

Monitoring again with the modified script:

[root@developer tools]# vim oomkill.bt
[root@developer tools]# ./oomkill.bt
Attaching 2 probes...
Tracing oom_kill_process()... Hit Ctrl-C to end.
19:06:46 Triggered by PID 1039 ("oeaware"), OOM kill of PID 1528049 ("stress-ng"), 1704429 pages, loadavg: 4.14 2.26 1.35 5/405 1528051
System memory statistics at OOM time:
MemTotal:        6817716 kB
MemFree:         1494936 kB
MemAvailable:    1442388 kB
Buffers:             372 kB
Cached:           155720 kB
SwapCached:            0 kB
Active:            31476 kB
Inactive:        5156300 kB
Active(anon):      31388 kB
Inactive(anon):  5117004 kB
Active(file):         88 kB
Inactive(file):    39296 kB
Unevictable:        6236 kB
Mlocked:              76 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:       5038288 kB
Mapped:            21148 kB
Shmem:            116328 kB
KReclaimable:      21692 kB
Slab:              66816 kB
SReclaimable:      21692 kB
SUnreclaim:        45124 kB
KernelStack:        6528 kB
PageTables:        22160 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     3408856 kB
Committed_AS:   12252200 kB
VmallocTotal:   135290159040 kB
VmallocUsed:       16152 kB
VmallocChunk:          0 kB
Percpu:             2800 kB
HardwareCorrupted:     0 kB
AnonHugePages:   4009984 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
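A full meminfo dump is long, so it is often easier to reduce it to a few ratios. The sketch below, in Python, parses meminfo text and derives three of them from the capture above; parse_meminfo and summarize are hypothetical helper names, and the choice of ratios is only an illustration.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo key to its integer value (kB for most keys)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def summarize(info):
    """Reduce a meminfo dict to a few ratios that matter for OOM analysis."""
    total = info["MemTotal"]
    return {
        "anon_pct": round(100.0 * info["AnonPages"] / total, 1),
        "available_pct": round(100.0 * info["MemAvailable"] / total, 1),
        "commit_ratio": round(info["Committed_AS"] / info["CommitLimit"], 2),
    }


# A few lines copied from the capture above.
SAMPLE = """MemTotal:        6817716 kB
MemAvailable:    1442388 kB
AnonPages:       5038288 kB
CommitLimit:     3408856 kB
Committed_AS:   12252200 kB
HugePages_Total:       0
"""
summary = summarize(parse_meminfo(SAMPLE))
```

One thing to keep in mind when reading this capture: MemFree is already about 1.4 GB, probably because bpftrace runs cat() from user space shortly after the probe fires, by which time part of the victim's memory may already have been released.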

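For kernel-side memory allocations, /proc/slabinfo could be collected as well.

The above gathers global memory counters. It is also possible to collect metrics for the killed process and for the process that triggered the kill.

Below is the BCC version of the oomkill tool as shipped: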

[root@liruilongs.github.io ~]# cat /usr/share/bcc/tools/oomkill 
#!/usr/bin/python3
#
# oomkill   Trace oom_kill_process(). For Linux, uses BCC, eBPF.
#
# This traces the kernel out-of-memory killer, and prints basic details,
# including the system load averages. This can provide more context on the
# system state at the time of OOM: was it getting busier or steady, based
# on the load averages? This tool may also be useful to customize for
# investigations; for example, by adding other task_struct details at the time
# of OOM.
#
# Copyright 2016 Netflix, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 09-Feb-2016   Brendan Gregg   Created this.

from bpfcc import BPF
from time import strftime

# linux stats
loadavg = "/proc/loadavg"

# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/oom.h>

struct data_t {
    u32 fpid;
    u32 tpid;
    u64 pages;
    char fcomm[TASK_COMM_LEN];
    char tcomm[TASK_COMM_LEN];
};

BPF_PERF_OUTPUT(events);

void kprobe__oom_kill_process(struct pt_regs *ctx, struct oom_control *oc, const char *message)
{
    unsigned long totalpages;
    struct task_struct *p = oc->chosen;
    struct data_t data = {};
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    data.fpid = pid;
    data.tpid = p->pid;
    data.pages = oc->totalpages;
    bpf_get_current_comm(&data.fcomm, sizeof(data.fcomm));
    bpf_probe_read_kernel(&data.tcomm, sizeof(data.tcomm), p->comm);
    events.perf_submit(ctx, &data, sizeof(data));
}
"""

# process event
def print_event(cpu, data, size):
    event = b["events"].event(data)
    with open(loadavg) as stats:
        avgline = stats.read().rstrip()
    print(("%s Triggered by PID %d (\"%s\"), OOM kill of PID %d (\"%s\")"
        ", %d pages, loadavg: %s") % (strftime("%H:%M:%S"), event.fpid,
        event.fcomm.decode('utf-8', 'replace'), event.tpid,
        event.tcomm.decode('utf-8', 'replace'), event.pages, avgline))

# initialize BPF
b = BPF(text=bpf_text)
print("Tracing OOM kills... Ctrl-C to stop.")
b["events"].open_perf_buffer(print_event)
while 1:
    try:
        b.perf_buffer_poll()
    except KeyboardInterrupt:
        exit()
[root@liruilongs.github.io ~]#

We make a small change to this tool. /proc/<pid>/status shows basic metrics for a process, so we print it for the process that triggered the OOM:

# process event
def print_event(cpu, data, size):
    event = b["events"].event(data)
    with open(loadavg) as stats:
        avgline = stats.read().rstrip()
    # /proc/<pid>/status of the triggering process; it may already have exited
    try:
        with open("/proc/%d/status" % event.fpid) as status:
            statustable = status.read().rstrip()
    except OSError:
        statustable = "(process already exited)"
    print(("%s Triggered by PID %d (\"%s\"), OOM kill of PID %d (\"%s\")"
        ", %d pages, loadavg: %s") % (strftime("%H:%M:%S"), event.fpid,
        event.fcomm.decode('utf-8', 'replace'), event.tpid,
        event.tcomm.decode('utf-8', 'replace'), event.pages, avgline))
    print("Status of the triggering process:\n%s" % statustable)

The output:

[root@developer tools]# ./oomkill
Tracing OOM kills... Ctrl-C to stop.
19:42:24 Triggered by PID 1539774 ("stress-ng"), OOM kill of PID 1539775 ("stress-ng"), 1704429 pages, loadavg: 2.40 1.72 1.07 5/407 1539778
Status of the triggering process:
Name:	stress-ng
Umask:	0077
State:	R (running)
Tgid:	1539774
Ngid:	0
Pid:	1539774
PPid:	1539770
TracerPid:	0
Uid:	0	0	0	0
Gid:	0	0	0	0
FDSize:	64
Groups:	0 
NStgid:	1539774
NSpid:	1539774
NSpgid:	1539769
NSsid:	32074
VmPeak:	 2410404 kB
VmSize:	 2410404 kB
VmLck:	       0 kB
VmPin:	       0 kB
VmHWM:	 1415824 kB
VmRSS:	 1415824 kB
RssAnon:	 1415760 kB
RssFile:	       4 kB
RssShmem:	      60 kB
VmData:	 2369388 kB
VmStk:	     132 kB
VmExe:	    1416 kB
VmLib:	    2888 kB
VmPTE:	    2828 kB
VmSwap:	       0 kB
HugetlbPages:	       0 kB
CoreDumping:	0
THP_enabled:	1
Threads:	1
SigQ:	0/26524
SigPnd:	0000000000000000
ShdPnd:	0000000000000000
SigBlk:	0000000000000000
SigIgn:	0000000008300a00
SigCgt:	000000002380e0af
CapInh:	0000000000000000
CapPrm:	000001ffffffffff
CapEff:	000001ffffffffff
CapBnd:	000001ffffffffff
CapAmb:	0000000000000000
NoNewPrivs:	0
Seccomp:	0
Seccomp_filters:	0
Speculation_Store_Bypass:	vulnerable
Cpus_allowed:	f
Cpus_allowed_list:	0-3
Mems_allowed:	00000000,00000000,00000000,00000001
Mems_allowed_list:	0
voluntary_ctxt_switches:	32
nonvoluntary_ctxt_switches:	837
Cpus_preferred:	0
Cpus_preferred_list:

This way we also collect useful metrics such as VmRSS.
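Since the status file is plain "Key: value" text, the memory fields can be picked out with a few lines of Python. This is a minimal sketch with hypothetical helper names (parse_status, memory_kib); the sample values are copied from the output above.

```python
def parse_status(text):
    """Split /proc/<pid>/status text into a {key: value-string} dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def memory_kib(fields,
               keys=("VmRSS", "RssAnon", "RssFile", "RssShmem", "VmPTE")):
    """Return the selected memory fields as integers in kB."""
    out = {}
    for key in keys:
        parts = fields.get(key, "").split()
        if parts and parts[0].isdigit():
            out[key] = int(parts[0])
    return out


# A few lines copied from the output above.
SAMPLE = ("Name:\tstress-ng\n"
          "VmRSS:\t 1415824 kB\n"
          "RssAnon:\t 1415760 kB\n"
          "RssFile:\t       4 kB\n"
          "RssShmem:\t      60 kB\n"
          "VmPTE:\t    2828 kB\n")
mem = memory_kib(parse_status(SAMPLE))
```

In this capture VmRSS equals RssAnon + RssFile + RssShmem (1415760 + 4 + 60 = 1415824), which lines up with the anon-rss, file-rss and shmem-rss fields the kernel prints in its OOM kill log line.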

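BPF tracing tools can provide much more insight into memory behavior. You can trace software events and the tracepoints for system calls and page faults; use kprobes to trace the kernel's memory allocation functions; use uprobes to trace library functions, application runtimes and application-specific memory allocators; use USDT probes to trace libc allocator events; and use PMCs for overflow sampling of memory accesses. Interested readers can dig deeper.

References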

《BPF Performance Tools》

Permalink: https://liruilongs.github.io/2025/04/10/待发布/如何使用 eBPF 监控 Linux 内存 OOM killer：Linux 内存调优之 eBPF 监控内存 OOM killer 事件/

Author: 山河已无恙
Published: 2025-04-10
Updated: 2025-05-19
